Variational Autoencoder for Deep Learning of Images, Labels and Captions

Authors

  • Yunchen Pu
  • Zhe Gan
  • Ricardo Henao
  • Xin Yuan
  • Chunyuan Li
  • Andrew Stevens
  • Lawrence Carin
Abstract

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN approximates a distribution over the latent DGDN features/code. The latent code is also linked to generative models for labels (a Bayesian support vector machine) or captions (a recurrent neural network). When predicting a label or caption for a new image at test time, averaging is performed across the distribution of latent codes; this is computationally efficient because the learned CNN-based encoder supplies that distribution directly. Since the framework can model an image whether or not associated labels or captions are present, it yields a new semi-supervised setting for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone.
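The test-time averaging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the encoder outputs the mean and log-variance of a diagonal Gaussian over the latent code, and it swaps in a plain linear softmax classifier where the paper uses a Bayesian support vector machine. All names (`sample_latent_codes`, `predict_label_probs`, `weights`, `bias`) and the toy dimensions are hypothetical.

```python
import numpy as np


def sample_latent_codes(mu, log_var, num_samples, rng):
    """Draw codes z = mu + sigma * eps with eps ~ N(0, I).

    Assumes a diagonal Gaussian over the latent code (an assumption here).
    """
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    return mu + np.exp(0.5 * log_var) * eps


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def predict_label_probs(mu, log_var, weights, bias, num_samples=50, seed=0):
    """Average class probabilities over sampled latent codes (Monte Carlo)."""
    rng = np.random.default_rng(seed)
    codes = sample_latent_codes(mu, log_var, num_samples, rng)  # (S, D)
    probs = softmax(codes @ weights + bias)                     # (S, C)
    return probs.mean(axis=0)                                   # (C,)


# Toy stand-ins for the encoder output and the label model (hypothetical).
toy_rng = np.random.default_rng(1)
latent_dim, num_classes = 8, 3
mu = toy_rng.standard_normal(latent_dim)        # encoder mean for one image
log_var = np.full(latent_dim, -2.0)             # encoder log-variance
weights = toy_rng.standard_normal((latent_dim, num_classes))
bias = np.zeros(num_classes)

avg_probs = predict_label_probs(mu, log_var, weights, bias)
print("averaged class probabilities:", avg_probs)
print("predicted label:", int(avg_probs.argmax()))
```

Because the encoder gives the distribution in a single forward pass, the only per-image cost in this sketch is drawing the samples and evaluating the label model on each one.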


Similar resources

Variational Autoencoder for Deep Learning of Images, Labels and Captions: Supplementary Material

Table 1: Semi-supervised classification accuracy (%) on the validation set of ImageNet 2012.

Proportion        1%             5%             10%            20%            30%            40%
top-1
AlexNet           0.1 ± 0.01     11.5 ± 0.72    19.8 ± 0.71    38.6 ± 0.31    43.23 ± 0.28   45.85 ± 0.23
GoogLeNet         4.75 ± 0.58    22.13 ± 1.14   32.18 ± 0.80   42.83 ± 0.28   49.61 ± 0.11   51.90 ± 0.20
BSVM (ours)       43.98 ± 1.15   47.36 ± 0.91   48.41 ± 0.76   51.51 ± 0.28   54.14 ± 0.12   57.34 ± 0.18
Softmax (ours)    42....


Combining sequential deep learning and variational Bayes for semi-supervised inference

In the application of machine learning to time series, coarse labelling of the sequence is generally known. However, often a finer granularity of the annotations is sought for improved accuracy and resolution in signal analysis. With a focus on medical time series, this study employs a bidirectional LSTM autoencoder, t-SNE, and variational Bayesian estimation in order to provide labels with fin...


Generating Images with Perceptual Similarity Metrics based on Deep Networks

Image-generating machine learning models are typically trained with loss functions based on distance in the image space. This often leads to over-smoothed results. We propose a class of loss functions, which we call deep perceptual similarity metrics (DeePSiM), that mitigate this problem. Instead of computing distances in the image space, we compute distances between image features extracted by...


GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by having ...


GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders

Deep learning on graphs has become a popular research topic with many applications. However, past work has concentrated on learning graph embedding tasks only, which is in contrast with advances in generative models for images and text. Is it possible to transfer this progress to the domain of graphs? We propose to sidestep hurdles associated with linearization of such discrete structures by ha...



Publication date: 2016